Method for the location of a speaker and the acquisition of a voice message, and related system

ABSTRACT

A system for the detection and location of acoustic signals which can be used, for example, for the acquisition of voice messages or the like in environments in which noise, echoes and reverberations are present. The system employs an array of microphones and is based on computing the inverse Fourier transform (antitransform) of the phase information alone of the normalised cross power spectrum of pairs of signals acquired from the microphones in the array. The system also enables the acoustic message to be reconstructed, cleared of the undesired components due to noise, echoes, etc.

FIELD OF THE INVENTION

The present invention relates in general to methods and systems for the acquisition and processing of acoustic signals, such as, for example, methods and systems for detecting, locating and reconstructing acoustic signals. Typical examples of applications of systems of this type are voice acquisition and speaker location.

DESCRIPTION OF THE PRIOR ART

The acquisition of a voice message for the purposes of recognising, coding and verifying speakers, etc. is conventionally performed by the use of a fixed ("head-mounted") microphone in front of the speaker or held in the speaker's hand ("hand-held"). These devices have disadvantages associated with the low signal/noise ratio and with the dependence of the performance of the system on the manner in which it is used (distance between the mouth and the microphone, knocks and vibrations, etc.). The use of an array of microphones can overcome some of these problems and also permits easier interaction between the user and the system.

The technical literature over the past ten years illustrates various examples of the use of arrays of microphones for the acquisition of voice messages.

Reference can be made, for example, to the articles "Some Analyses of Microphone Arrays for Speech Data Acquisition" by H. F. Silverman, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-35, no. 12, Dec. 1987 and "Computer-steered Microphone Arrays for Sound Transduction in Large Rooms" by J. L. Flanagan, J. D. Johnston, R. Zahn, G. W. Elko, J. Acoust. Soc. Am., 78(5), November, 1985, pp. 1508-1518.

The acquisition of voice messages by means of an array of microphones has conventionally been achieved using techniques typical of the processing of underwater acoustic signals and radar signals, since the object is to detect the position of the acoustic source by means of several sensors distributed in space and to utilise this knowledge to improve the ratio between useful signals and ambient noise.

At times, these techniques enable the information coming from the source to be extracted without resorting to an explicit detection of its position (for example, beamforming techniques, LMS adaptive filtering: see, for example, the articles "Time Delay Estimation Using the LMS Adaptive Filter - Static Behaviour" by F. A. Reed, P. L. Feintuch, N. J. Bershad, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-29, no. 3, June, 1981 and "On Time Delay Estimation Involving Received Signals" by C. Y. Wuu, A. E. Pearson, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-32, no. 4, August, 1984).

The problem of locating an acoustic source by the use of an array of microphones essentially reduces to the problem of measuring time delays between the signals acquired from different sensors. When the relative delays with which the sound wave has reached the different microphones are known, the curvature of the incident wave front emitted by the acoustic source can be reconstructed and traced back to its centre, at which the source which produced it is assumed to be located.

The most widely used technique for estimating the relative delay between two signals is based on finding the maximum of the cross-correlation: see, for example, the articles "An Algorithm for Determining Talker Location using a Linear Microphone Array and Optimal Hyperbolic Fit" by H. F. Silverman, Proc. Speech and Natural Language Workshop DARPA, June, 1990, pp. 151-156, and "A Two-stage Algorithm for Determining Talker Location from Linear Microphone Array Data" by H. F. Silverman, S. E. Kirtman, Computer Speech and Language (1992) 6, pp. 129-152.

However, the efficacy of this method is strongly influenced by the spectral content of the signals in question. For example, in the case of narrow-band signals (such as a whistle) or signals of high periodicity (such as a voiced sound), the estimation of the delay becomes critical or even impossible in the presence of echoes and reverberations: in these cases it is more effective to extract directly the information most useful for estimating the delay, namely the phase information.

The stage of detecting an acoustic event consists in preprocessing the signals acquired from the microphones in order to determine the acoustically significant time segments on which a subsequent source-locating operation will be performed.

In the general case of sources of unknown and arbitrary acoustic events it is impossible to make assumptions a priori about the spectral characteristics of the signals emitted, and the detection method cannot be based on particular signal models.

The characterisation of the acoustic signal in terms of power is the most direct and simplest which can be taken into consideration for performing the detection: the exceeding of fixed or adjustable thresholds (dependent on the estimated noise level) can be sufficient in cases in which the signal/noise ratio is not too low.

As said above, some conventional methods of processing signals acquired by means of arrays of microphones enable an optimum signal to be reconstructed without the position of the acoustic source being estimated beforehand; this signal can be considered equivalent to the initial acoustic message, all the undesired acoustic components, attributable to secondary sources, being attenuated.

OBJECTS AND SUMMARY OF THE INVENTION

The object of the present invention is to provide a method and a system for the acquisition and processing of acoustic signals inherent in an acoustic event which enable the above disadvantages of the prior art to be overcome or at least reduced.

In accordance with the present invention, this object is achieved by means of a method and a system having the characteristics indicated in the claims following the present description.

More specifically, the solution according to the invention has characteristics of robustness, speed of calculation, accuracy and insensitivity to interference which are superior to those of the prior art systems. Solutions of this type can be used for the acquisition of a voice message or of other types of acoustic event and for their location.

The present invention provides for the use of at least one array of microphones in a system enabling the acquisition of a general acoustic message in a noisy environment to be improved.

The present invention also provides for the possibility of processing information extracted from the signals acquired by means of the array of microphones, also enabling the speaker or the acoustic source which produced the message to be located.

Both the detection and the location of the message are performed, in an original manner, using the phase information present in the normalised cross-spectrum (estimated by means of a fast Fourier transform or FFT) relative to the signals acquired from a pair of microphones in the array.

The subsequent derivation of a new version of the message, improved in terms of the useful signal/ambient noise ratio relative to the individual acquisitions of each microphone in the array, is performed on the basis of the information obtained during the phase in which the message itself is detected and located. Thus, although it still uses simply a linear combination of the signals from the microphones in the array, suitably delayed, this method of reconstructing the signals is also distinguished by the original manner in which the information relating to the disphasing between the signals acquired via the different microphones in the array is used.

What is to be understood by the term "array of microphones" in the present description and the following claims is a device composed of a plurality of microphones, preferably omnidirectional, which are aligned with respect to one another and at regular spacings from one another. Although it is not specifically mentioned in the following description, it is in all cases also possible to perform the invention with other types of microphones spatially distributed in a different manner: for example, in the manner described in the article "An approach of Dereverberation Using Multi-Microphone Sub-Band Envelope Estimation" by H. Wang and F. Itakura, Proc. IEEE Int. Conf. on Acoust. Speech Signal Processing, May, 1991, pp. 953-956.

It is self-evident that the expression "microphone" as used in the present context generally embraces all mechanical-electrical transducers which can convert an acoustic vibratory phenomenon (ultrasound included) into a processable electrical signal.

It will thus be appreciated that the microphones are connected to an analogue-to-digital conversion system operating at a sufficiently high sampling frequency (for example 24-48 kHz) synchronously between the various channels.

Specifically, in the present description reference is made to an embodiment using four microphones, although, theoretically, three would be sufficient for locating the source; however, a larger number of microphones can ensure that the system performs better.

The method described below refers in particular to the processing of acoustic messages, consisting of a preliminary detection of the event itself, the accurate location of the position in which this event was generated and, finally, an optional reconstruction of a version of the original message cleared of the noise and reverberation components, etc. In this way it is possible to use the module for locating and/or detecting the acoustic event independently of whether the message then has to be converted into a version of optimum quality for the purposes of coding and voice recognition.

It can thus be assumed that the method and system according to the invention operate efficiently on sounds having their origin in a zone which is spatially restricted and the corresponding acoustic pressure wave of which has particular directionality features, unlike background noise, which is assumed to be diffused almost uniformly in the environment.

Thus, the present description does not take into consideration cases in which several speakers (or generic acoustic sources) emit simultaneous messages having comparable dynamics, for which the method described would have to be combined (in a known manner) with methods for separating the sources.

In a particularly advantageous embodiment, the present invention provides for the use of a technique of estimating phase delays, such as the one described in the article "The Generalized Correlation Method for Estimation of Time Delay" by C. H. Knapp, G. C. Carter, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-24, no. 4, August, 1976, never used previously in this area of acoustic analysis.

A technique of this type uses the Fourier antitransform of a version of the cross-spectrum of the two signals in which only the phase information is maintained. Thus, amplitude information, which is irrelevant for measuring delays when the signal/noise ratio is sufficiently high, is eliminated from the cross-spectrum of the signals.

The application to real signals acquired in a reverberating environment has demonstrated that the efficiency of this method is to a large extent independent of the type of source to be located (voice, whistling, explosions, various types of noises). It is furthermore possible to discriminate signals of a directional nature from other acoustic phenomena of a different type (background noise, reverberations, resonance), even if they are of the same intensity. The cost in terms of computation is comparable to that of the most efficient cross-correlation calculation and less than that of other delay estimators based on adaptive filtering.

The present invention thus proposes a novel detection method based on a function of coherence between pairs of signals exceeding a threshold, the same function also being used in the subsequent location phase. A function of this type provides a reliable indication of the presence of an acoustic event, even one of very short duration, which has evident directionality features.

The invention further proposes a method for reconstructing an optimum signal as a linear combination of the signals acquired by means of the microphones, disphased according to the estimate of the position of the source (or of the delays between the various pairs) supplied by the locating module.

The method and system according to the invention can be used mainly for the acquisition of a voice message in a noisy environment, without the need for the speaker to speak the message in front of the microphone. If the acquisition environment is noisy and reverberating, the message is cleared of some of the undesired components. The message acquired in this manner can then be supplied to a coding system (for teleconference or voice message applications) or to a voice recognition system.

DETAILED DESCRIPTION OF THE INVENTION

Further advantages and characteristics of the present invention will appear from the following description, given purely by way of non-limiting example and with reference to the appended drawings, in which:

FIG. 1 shows schematically the operating conditions of the system according to the present invention,

FIG. 2 is a schematic block diagram of the system according to the present invention,

FIG. 3 is a schematic block diagram of part of the system according to the present invention, and

FIG. 4 is a schematic block diagram of a block of the part of the system illustrated in FIG. 3.

FIG. 1 illustrates schematically an environment in which the system operates. The acoustic source (speaker, generic sound sources, etc., that is, the acoustic event which is to be detected) is indicated AS, whilst the array of microphones consists of four microphones P₀, P₁, P₂, P₃ shown aligned along an axis X.

The relative positions of the microphones and of the acoustic source are expressed in the form of co-ordinates in a cartesian plane x, y. The acoustic source AS emits wave fronts which are detected at different times and in different ways at the different points of the spatial region in which they propagate, the microphones P₀, P₁, P₂, P₃ of the array thus allowing the system to sample the wave front at different points.

FIG. 2 shows the general diagram of the system. The signals are acquired by the use of four omnidirectional microphones P₀, P₁, P₂, P₃, which are assumed to be equally spaced relative to one another (for example, a 15 cm spacing between two adjacent microphones) and are connected to four analogue-to-digital converters A/D₀, A/D₁, A/D₂, A/D₃ operating at a given sampling frequency F_(c) of, for example, 48 kHz. The four outputs of these acquisition modules, indicated S₀, S₁, S₂, S₃ (S_(i) in which i=0, . . . , 3), are connected to a processing module generally indicated RLR (detection of the events, location of the source and reconstruction of the signal).

FIG. 3 shows the operating block diagram of the module RLR. At its input, the module RLR receives all the signals S_(i) (in which i=0, . . . , 3); the outputs of this module consist of a pair of co-ordinates X and Y (if necessary with an angular co-ordinate Θ which identifies the direction of the source AS), of a detection index d and of a reconstructed signal RS.

In the following, the modules constituting the module RLR and the operations they perform to obtain the said outputs will be described.

In practice, the module RLR can be constituted by an electronic processing device such as a minicomputer or by a specialised processor specifically programmed for this purpose. The criteria for producing, programming and using computers and/or processors of this type are well known in the art and need not therefore be described herein.

The module RLR comprises a first series of modules EST₀, EST₁, EST₂, EST₃ (EST_(i), where i=0, . . . , 3) which convert the signals S_(i) (from the microphones P₀, P₁, P₂, P₃), received respectively at the input, into frames of digital samples and furthermore apply a window to the frames obtained. The output of the modules EST thus consists of the frames indicated x₀, x₁, x₂, x₃ respectively (x_(i) where i=0, . . . , 3).

A second series of modules, indicated CFFT₀, CFFT₁, CFFT₂, CFFT₃ (CFFT_(i), where i=0, . . . , 3), the inputs of which are connected to the respective outputs of the modules EST_(i), perform the fast Fourier transform calculation or FFT (or optionally another integral transform) for all the frames. The outputs of the modules CFFT_(i) in which i=0, . . . , 3 are designated X₀, X₁, X₂, X₃ (X_(i), where i=0, . . . , 3) respectively.

A third series of modules, indicated CS₁, CS₂, CS₃ (CS_(i), in which i=1, . . . , 3), calculates the cross-spectra, or normalised cross (power) spectra estimated by the use of an FFT (Fast Fourier Transform), between pairs of frames. Each of the modules CS_(i) in fact receives as input the outputs of two modules of the preceding series, that is, of the modules CFFT_(i). In particular, each module CS_(i) receives as input the output X_(i) of the corresponding module CFFT_(i) and the output X₀ of the module CFFT₀.

In this way, the modules CS_(i) calculate the normalised cross-spectrum of the pairs of frames (X₀, X₁), (X₀, X₂), (X₀, X₃) extracted from the signals S₀, S₁, S₂, S₃. The modules CS_(i) furthermore calculate the inverse FFTs of the normalised cross-spectra. The outputs of the modules CS_(i) consist of the signals C₁, C₂, C₃ (C_(i), where i=1, . . . , 3) respectively.

A fourth series of modules, indicated ICM₁, ICM₂, ICM₃ (ICM_(i) where i=1, . . . , 3), interpolates the signals C₁, C₂, C₃ obtained in this manner and searches for their maxima over the delay variable. The outputs of the modules ICM_(i) are provided by the pairs of signals M₁ and δ₁, M₂ and δ₂, M₃ and δ₃.

A module RIL performs the detection function on the basis of the signals M₁, M₂, M₃. The output of the module RIL is the signal d.

A module LOC performs the location function, that is, determining the direction Θ from which the wave front arrives and calculating the co-ordinates (X, Y) of the source. The module LOC operates on the basis of the signals δ₁, δ₂, δ₃ and emits the signal Θ and the pair of co-ordinates X, Y at the output.

A module RIC performs the reconstruction function, that is, constructing a new version of the acoustic message, represented by the signal RS emitted at the output. The module RIC operates on the basis of the input signals δ₁, δ₂, δ₃ and S₀, S₁, S₂, S₃.

The various modules constituting the system according to the present invention and the operations they perform will now be described in more detail, module by module.

Modules EST_(i)

Each module EST_(i) extracts from the respective signal S_(i) frames x_(i) of length t_(f) ms, corresponding to N samples, with an analysis pitch of t_(a) ms. Each frame is then weighted with a Blackman window, defined as described in "Digital Signal Processing" by A. V. Oppenheim, R. W. Schafer, Prentice Hall 1975. The use of the Blackman window has proved more effective for the purposes of the present invention than the use of a conventional Hamming window.
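The framing and windowing step can be sketched as follows. This is only an illustrative sketch: the function name `extract_frames` and its parameters are assumptions introduced here, and the default values (N = 1024 samples, pitch of 512 samples) simply mirror the figures quoted for the embodiment.

```python
import numpy as np

def extract_frames(signal, n_samples=1024, hop=512):
    """Split a 1-D signal into overlapping frames and weight each one
    with a Blackman window (hypothetical helper, not from the patent)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_samples:
        # Not enough samples for even one frame.
        return np.empty((0, n_samples))
    n_frames = 1 + (len(signal) - n_samples) // hop
    window = np.blackman(n_samples)
    frames = np.stack(
        [signal[k * hop : k * hop + n_samples] for k in range(n_frames)]
    )
    # Broadcasting applies the same window to every frame (row).
    return frames * window
```

With a pitch of half a frame, consecutive frames overlap by 50%, which corresponds to t_(a) = t_(f)/2.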

Modules CFFT_(i)

The modules CFFT_(i) receive as input the frames x_(i) of N samples, extracted from the signals S_(i) and weighted as described above. The frames then undergo an FFT to produce a complex sequence X_(i) of N components. One possible calculation of the FFT is described, for example, in the above-mentioned book by Oppenheim and Schafer. The embodiment described is set up such that F_(c) = 48 kHz, N = 1024 (and consequently t_(f) = 21.33 ms) and t_(a) = t_(f)/2 = 10.66 ms. It will be appreciated that the above values need not be interpreted in a strictly limitative sense. They are nevertheless indicative of the respective orders of magnitude in which parameters of this type are selected.
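As a short worked check of the figures quoted above (the frequency resolution in the last line is not stated in the text and is added here only as a derived quantity):

```latex
t_f = \frac{N}{F_c} = \frac{1024}{48\,000\ \mathrm{Hz}} \approx 21.33\ \mathrm{ms},
\qquad
t_a = \frac{t_f}{2} \approx 10.67\ \mathrm{ms}\ (512\ \text{samples}),
\qquad
\Delta f = \frac{F_c}{N} = 46.875\ \mathrm{Hz}.
```

The value 10.66 ms given in the text is the same quantity truncated rather than rounded.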

Modules CS_(i)

In practice all modules CS_(i) comprise three submodules, shown in FIG. 4 for better understanding.

A first submodule X-SP calculates the cross-spectrum of a pair of complex sequences X₀, X_(i). A second submodule NORM normalises the abovementioned cross-spectrum calculated by the submodule X-SP, generating a complex vector Y_(i) at the output. Finally, a third submodule CFFT⁻¹ performs an inverse FFT of the said vector Y_(i).

These operations, described briefly above, will now be described in further detail, particularly as regards the mathematical aspect.

For each analysis moment t, for each pair of sequences (X₀, X₁), (X₀, X₂), (X₀, X₃), the vector ρ_(j) of N components is calculated, defined as:

$$\rho_j = \mathrm{FFT}^{-1}[Y_j]$$

when j=1, 2, 3, where the l-th generic complex component of the vector Y_(j) is defined as: ##EQU1## in which X_(j) * indicates the conjugate complex vector of the vector X_(j).
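The defining equation of Y_(j) is not reproduced in this text. One form consistent with the surrounding description (a normalised cross-spectrum in which only the phase information is kept) is Y_j(l) = X_0(l) X_j*(l) / (|X_0(l)| |X_j(l)|); this is an assumption, not a quotation of the missing equation. Under that assumption, the three submodules X-SP, NORM and CFFT⁻¹ can be sketched as follows (the function name and the small constant `eps` are introduced here for illustration only):

```python
import numpy as np

def phase_coherence(x0, xj, eps=1e-12):
    """Inverse FFT of the phase-only (normalised) cross-spectrum of two
    real frames. Assumed form: Y = X0 * conj(Xj) / (|X0| * |Xj|)."""
    X0 = np.fft.fft(x0)
    Xj = np.fft.fft(xj)
    cross = X0 * np.conj(Xj)           # submodule X-SP: cross-spectrum
    Y = cross / (np.abs(cross) + eps)  # submodule NORM: |cross| = |X0||Xj|
    # For real input frames Y is Hermitian-symmetric, so the inverse
    # transform is real up to rounding error.
    return np.real(np.fft.ifft(Y))     # submodule CFFT^-1
```

With this sign convention, a frame `xj` that is a circularly delayed copy of `x0` by D samples produces a unit peak at index N - D, that is, in the second half of the vector. Swapping which spectrum is conjugated mirrors the peak into the first half, so the sign convention should be checked against the intended definition of the delay.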

The components ρ_(j)(i) of the vector ρ_(j) express a measure of coherence between the original signal frames when the relative delay τ_(i) is equal to i sampling intervals. The k-th generic component of the first half of the vector (components from index 0 to index N/2-1) corresponds to a positive delay k/F_(c); the k-th generic component of the second half of the vector (components from index N/2 to index N-1) corresponds to a negative delay (that is, a lead) equal to (N-k)/F_(c).
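The mapping between component index and signed delay described above can be written as a small helper. The function name and default parameters are assumptions chosen to match the embodiment (N = 1024, F_(c) = 48 kHz).

```python
def index_to_delay(k, n_samples=1024, fs=48_000.0):
    """Convert a component index of the coherence vector into a signed
    delay in seconds: first half -> positive delay, second half -> lead."""
    if not 0 <= k < n_samples:
        raise ValueError("index out of range")
    if k < n_samples // 2:
        return k / fs                 # positive delay k / F_c
    return -(n_samples - k) / fs      # negative delay (lead) of (N - k) / F_c
```

For example, index 48 corresponds to a delay of 1 ms, while index 1023 corresponds to a lead of one sampling interval.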

In ideal conditions, in which the two signals are equal except for a scale factor and a delay τ₀ equal to a whole number of sampling intervals, a sequence ρ_(j) consisting of a pulse centred on the component corresponding to the delay τ₀ would be obtained. In practice, ρ_(j)(i) can be interpreted as an index of coherence between the frame x₀ and the frame obtained by disphasing x_(j) by a number of samples corresponding to the delay τ_(i) = i/F_(c), or, in the case of a fixed acoustic source, as an index of coherence between the signal S₀ and the signal S_(j) disphased by τ_(i). The components of the vector ρ_(j) are normalised between 0 and 1. As defined above, the analysis performed on the frames every t_(a) ms leads to the determination of three coherence functions C₁(t, τ), C₂(t, τ), C₃(t, τ) consisting at any moment t = n·t_(a) of the vectors ρ₁, ρ₂, ρ₃, respectively.
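The ideal-case statement can be checked with a short derivation. It assumes the phase-only form Y_j(l) = X_0(l) X_j^*(l) / (|X_0(l)| |X_j(l)|) for the normalised cross-spectrum (an assumption, since the defining equation is not reproduced in this text), a positive scale factor a, and a circular delay of D samples within the frame:

```latex
x_j(n) = a\,x_0\big((n-D) \bmod N\big),\quad a>0
\;\Longrightarrow\;
X_j(l) = a\,X_0(l)\,e^{-\mathrm{i}2\pi lD/N}

Y_j(l) = \frac{X_0(l)\,X_j^{*}(l)}{|X_0(l)|\,|X_j(l)|}
       = \frac{a\,|X_0(l)|^{2}\,e^{\mathrm{i}2\pi lD/N}}{a\,|X_0(l)|^{2}}
       = e^{\mathrm{i}2\pi lD/N}

\rho_j(i) = \frac{1}{N}\sum_{l=0}^{N-1} e^{\mathrm{i}2\pi l(i+D)/N}
          = \begin{cases} 1, & (i+D) \bmod N = 0\\ 0, & \text{otherwise.}\end{cases}
```

The scale factor cancels and the result is a single unit pulse whose position depends only on the delay, which is why the amplitude information can be discarded. Under this assumed sign convention the pulse falls at index (N - D) mod N.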

Modules ICM_(i)

In order to render the abovementioned coherence information more detailed, each vector ρ_(j) is reprocessed in the modules ICM by means of an interpolation and filtering operation. In this way the estimation of the delay between two signals can be made more accurate.

In practice, by applying to the function C_(j)(t, τ), that is, to the vector ρ_(j) at any moment t = n·t_(a), an interpolation and filtering operation (described, for example, in the article "Optimum FIR Digital Filter Implementation for Decimation, Interpolation and Narrow Band Filtering" by R. E. Crochiere, L. R. Rabiner, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-23, no. 5, pp. 444-456, October, 1975), a new coherence function C'_(j)(t, τ') is obtained in which the discrete variable τ' has a finer resolution than the discrete variable τ.

For each coherence function C'_(j)(t, τ') a search is then performed, at any moment t = n·t_(a), for the maximum of the function itself when the delay τ' is varied (in practice, the position of this maximum expresses the phase information present in the cross-spectra calculated above). The maximum of this function when τ' is varied is defined as M_(j)(t), when j=1, 2, 3:

$$M_j(t) = \max_{\tau'} C'_j(t, \tau')$$

and the delay τ'_(max) corresponding thereto is defined as δ_(j)(t).
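The interpolation and peak search can be sketched as below. The filter design is an assumption: a Hamming-windowed sinc FIR interpolator applied circularly, chosen for simplicity rather than the optimum design of the cited article. The function names, the interpolation factor and the filter length are all illustrative.

```python
import numpy as np

def interpolate_coherence(c, factor=4, half_len=8):
    """Upsample a coherence vector by `factor` with a windowed-sinc FIR
    filter, treating the vector as circular (assumed design)."""
    c = np.asarray(c, dtype=float)
    k = half_len * factor
    up = np.zeros(len(c) * factor)
    up[::factor] = c                       # zero-stuffing
    n = np.arange(-k, k + 1)
    h = np.sinc(n / factor) * np.hamming(2 * k + 1)
    padded = np.concatenate([up[-k:], up, up[:k]])  # circular extension
    return np.convolve(padded, h, mode="valid")

def peak_and_delay(c_interp, factor=4, fs=48_000.0):
    """Return (M, delta): the maximum of the interpolated function and the
    signed delay in seconds at which it occurs."""
    m = len(c_interp)
    idx = int(np.argmax(c_interp))
    if idx >= m // 2:
        idx -= m                           # second half -> negative delay
    return float(c_interp[idx]), idx / (factor * fs)
```

With `factor=4` at 48 kHz the delay grid is refined from about 20.8 µs to about 5.2 µs. The original samples are preserved because the filter is zero at every non-zero multiple of `factor`.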

Module RIL: Detection

The detection of the acoustic event is based at any moment t on the values M₁(t), M₂(t), M₃(t). A detection index d(t), defined as:

$$d(t) = \max[M_1(t), M_2(t), M_3(t)]$$

is derived from these functions.

Whenever this index exceeds an empirically predefined threshold S_(d) (in the present embodiment, for example, S_(d) = 0.7), an acoustic event is considered to be initiated. The event is considered to be terminated when the said index returns below this threshold.
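The detection rule above can be sketched as a simple threshold on the frame-by-frame index. The function name and the representation of events as (start, end) frame indices are assumptions made for this example.

```python
import numpy as np

def detect_events(m1, m2, m3, threshold=0.7):
    """Compute d(t) = max(M1, M2, M3) per frame and return the index
    together with (start, end) frame ranges where it exceeds the threshold."""
    d = np.maximum.reduce(
        [np.asarray(m1, float), np.asarray(m2, float), np.asarray(m3, float)]
    )
    active = d > threshold
    events, start = [], None
    for n, flag in enumerate(active):
        if flag and start is None:
            start = n                      # event initiated
        elif not flag and start is not None:
            events.append((start, n))      # event terminated
            start = None
    if start is not None:
        events.append((start, len(active)))  # still active at the end
    return d, events
```

The end index of each range is exclusive, so an event reported as (1, 3) covers frames 1 and 2.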

Module LOC: Location

The location operation of the acoustic source is performed in any time interval in which detection has provided a positive result (see FIG. 1).

At any moment t, the value δ_(j)(t) can be converted into the direction from which the wave front arrived, with respect to the centre of the pair of microphones (0, j): this direction can be expressed, in angular terms, as:

$$\Theta_j(t) = \arccos\left(\frac{v\,\delta_j(t)}{d_j}\right)$$

in which v is the speed of sound and d_(j) is the distance between the microphone P₀ and the microphone P_(j). For any moment t, a direction Θ_(j)(t), corresponding to the delay δ_(j)(t), is associated with each pair of microphones (0, j).
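The angular formula can be evaluated as in the sketch below. The value of 343 m/s for the speed of sound, the output in degrees and the clipping of the arccos argument (to guard against estimates slightly outside the physically possible range) are assumptions added for this example.

```python
import numpy as np

def arrival_angle(delay_s, mic_distance_m, speed_of_sound=343.0):
    """Direction of arrival in degrees from the delay between two
    microphones, assuming a plane wave."""
    ratio = speed_of_sound * delay_s / mic_distance_m
    # Noise in the delay estimate can push the ratio just beyond +/-1.
    ratio = np.clip(ratio, -1.0, 1.0)
    return float(np.degrees(np.arccos(ratio)))
```

A zero delay gives 90 degrees (source broadside to the pair), while a delay equal to d_(j)/v gives 0 degrees (source on the axis of the array).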

This modeling is based on the assumption that the acoustic pressure wave has reached the array in the form of a plane wave. The assumption is no longer valid in the case in which the source is a short distance away from the array.

In this case, which is the one in which the embodiment described is used, the possible points which may give rise to the acoustic event in question lie on a branch of a hyperbola whose foci are at the positions of the two microphones of the pair. The use of four microphones, and thus of three pairs, enables three branches of a hyperbola to be determined, the intersections of which delimit the area inside which the source should be located.
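The hyperbolic locus follows from the fact that a measured delay fixes the difference between the distances from the source to the two microphones. Writing s = (x, y) for the unknown source position and p_0, p_j for the microphone positions (symbols introduced here for illustration):

```latex
\Big|\,\lVert s - p_0 \rVert - \lVert s - p_j \rVert\,\Big| = v\,\lvert \delta_j \rvert
```

This is the standard definition of a hyperbola with foci p_0 and p_j; the sign of δ_(j) selects which of the two branches is relevant. When the source is far from the array the branch approaches its asymptote, which recovers the plane-wave direction Θ_(j).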

The following procedure is used to calculate the intersection between two branches of a hyperbola, for example, those corresponding to the pairs (0, 1) and (0, 2).

With the co-ordinates of the microphones 0, 1, 2 being set as p₀, p₁, p₂ along the axis of the array, and the delays estimated by each pair being indicated as δ₀₁ and δ₀₂, the co-ordinates of the point of intersection are given as: ##EQU3##

The co-ordinates x_(p13), y_(p13), x_(p23), y_(p23) of the points of intersection between the other two pairs of branches of a hyperbola are determined in a similar manner.

The co-ordinates (x, y) of the acoustic source are derived from these three points, as the barycentre of the triangle of which they form the vertices.
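The final step, taking the barycentre of the three intersection points, amounts to averaging their co-ordinates. The function and argument names below are illustrative only.

```python
def barycentre(p12, p13, p23):
    """Barycentre (centroid) of the triangle whose vertices are the three
    pairwise intersection points, each given as an (x, y) tuple."""
    xs, ys = zip(p12, p13, p23)
    return sum(xs) / 3.0, sum(ys) / 3.0
```

When the three hyperbola branches intersect almost at a single point the triangle is small and the barycentre changes little; a large triangle indicates inconsistent delay estimates.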

Module RIC: Reconstruction

The reconstruction of the signals on the basis of the signals s₀(t), s₁(t), s₂(t), s₃(t) and of the delays δ₁(t), δ₂(t), δ₃(t), respectively between the pairs of signals (0, 1), (0, 2), (0, 3), is based on a modeling of the desired signal of the following type:

$$s(t) = a_0 s_0(t) + a_1 s_1(t+\delta_1(t)) + a_2 s_2(t+\delta_2(t)) + a_3 s_3(t+\delta_3(t))$$

in which a₀, a₁, a₂, a₃ are numerical coefficients, as also recited in claim 3.

Using this modeling, the array can be "directed" at any moment towards the position determined from the given delays.
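The linear combination recited in claim 3 can be sketched as a delay-and-sum operation. The sketch makes several simplifying assumptions: delays are whole numbers of samples and constant over the block, shifts are circular, and the default coefficients are equal (1/number of channels). The function name is illustrative.

```python
import numpy as np

def delay_and_sum(signals, delays_samples, weights=None):
    """Combine channel signals as sum_j a_j * s_j(t + delta_j), with
    integer-sample delays and circular shifts (simplifying assumptions)."""
    signals = np.asarray(signals, dtype=float)
    n_channels = signals.shape[0]
    if weights is None:
        weights = np.full(n_channels, 1.0 / n_channels)
    out = np.zeros(signals.shape[1])
    for s, d, a in zip(signals, delays_samples, weights):
        # np.roll(s, -d)[n] == s[n + d], i.e. the advanced signal s(t + d).
        out += a * np.roll(s, -int(d))
    return out
```

The sign of each delay must follow the same convention as the delay estimator that produced it. With correctly aligned channels the directional signal adds coherently, while diffuse noise and reverberation do not, which is the source of the improvement in signal/noise ratio.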

It will be appreciated that, as the principle of the invention remains the same, the details of construction and forms of embodiment may vary widely with respect to those described and illustrated, without thereby departing from the scope of the present invention.

What is claimed is:
 1. A method for the acquisition and processing of acoustic signals inherent in an acoustic event manifested in a given spatial region, the method comprising the operations of: acquiring the said acoustic signals at a plurality of different points in the said spatial region, generating from the said acoustic signals first signals indicative of cross-spectra for a plurality of pairs of the said acoustic signals, extracting phase information present in the said cross-spectra for the purposes of acquisition and/or processing, and locating at any moment the said acoustic event on the basis of delays calculated on the basis of the estimation of signals obtained by antitransformation of the said first signals.
 2. A method according to claim 1, including the operation of reconstructing the said acoustic event using the said acoustic signals in conjunction with delays calculated on the basis of the estimation of the said first signals.
 3. A method according to claim 2, including basing the said reconstruction of the acoustic event on a modeling of the acoustic signal to be reconstructed substantially according to the formula:

$$s(t) = a_0 s_0(t) + a_1 s_1(t+\delta_1(t)) + a_2 s_2(t+\delta_2(t)) + a_3 s_3(t+\delta_3(t))$$

in which s(t) is the said acoustic signal to be reconstructed, s₀(t), s₁(t), s₂(t), s₃(t) are the said acoustic signals, δ₁(t), δ₂(t), δ₃(t) are the said delays, and a₀, a₁, a₂, a₃ are numerical coefficients.
 4. A method according to claim 1, wherein the said acoustic signals are converted into digital format after measurement.
 5. A method according to claim 4, wherein the said conversion into digital format occurs at a given sampling frequency which is higher than a frequency band of the said acoustic event.
 6. A method according to claim 1, wherein the operation for generating the said first signals on the basis of the said acoustic signals comprises the phases of: extracting sampling frames from the said acoustic signals, calculating an integral transform from the said frames, calculating cross power spectra for a plurality of pairs of the integral transform of the said frames, calculating an antitransform of the said cross power spectra.
 7. A method according to claim 6, wherein the phase for extracting the frames comprises the phases of: extracting frames having predetermined lengths t_(f), corresponding to a predetermined number N of samples, with a pitch t_(a), weighting the said frames by means of a window.
 8. A method according to claim 7, wherein the said window is a Blackman window.
 9. A method according to claim 6, wherein, for a sampling frequency F_(c) = 48 kHz, N is selected such that it is of the order of 1024, t_(f) is of the order of 21.33 ms and t_(a) is of the order of t_(f)/2 = 10.66 ms.
 10. A method according to claim 6, wherein the integral transform of the frames is a Fourier transform.
 11. A method according to claim 10, characterized in that the Fourier transform is a fast Fourier transform or FFT.
 12. A method according to claim 6, wherein the said cross power spectra are normalised cross power spectra.
 13. A method according to claim 5, wherein the phase of calculating the cross power spectra comprises: the phase of calculating for each of said pairs of the transform a vector ρ_(j) having N components substantially in accordance with the formula

$$\rho_j = \mathrm{FFT}^{-1}[Y_j]$$

when j=1, 2, 3, the pairs being X₀, X₁; X₀, X₂; X₀, X₃; and the l-th complex generic component of the vector Y_(j) being defined as: ##EQU4## in which X_(j) * is the conjugate complex vector of the vector X_(j).
 14. A method according to claim 13, characterized in that the components of the vector ρ_(j) are normalised.
 15. A method according to claim 1, wherein the operation for generating the said first signals on the basis of the said acoustic signals comprises the phases of: extracting sampling frames from the said acoustic signals, calculating an integral transform from the said frames, calculating cross power spectra for a plurality of pairs of the integral transform of the said frames, calculating an antitransform of the said cross power spectra, wherein the phase of calculating the cross power spectra comprises: the phase of calculating for each of the pairs a vector ρ_(j) having N components substantially in accordance with the formula

$$\rho_j = \mathrm{FFT}^{-1}[Y_j]$$

when j=1, 2, 3, the pairs being X₀, X₁; X₀, X₂; X₀, X₃; and the l-th complex generic component of the vector Y_(j) being defined as: ##EQU5## in which X_(j) * is the conjugate complex vector of the vector X_(j), and wherein the method includes the phase of estimating the relative delay between pairs of frames of signals which comprises the phase of using the vector ρ_(j) to calculate an index of coherence between the frame x₀ and a frame obtained by disphasing the frame x_(j) by a number of samples i corresponding to a delay τ_(i) = i/F_(c), equivalent to an index of coherence between the acoustic signal S₀ and the acoustic signal S_(j) disphased by a delay τ_(i).
 16. A method according to claim 12, wherein the first signals comprise coherence functions C_(j)(t, τ) consisting of the vectors ρ_(j) respectively.
 17. A method according to claim 6, wherein the sample frames are extracted in pairs each comprising a first frame present in each pair, and a second frame selected from the frames which are different from the first frame common to all the pairs such that there is one pair for each of the frames different from the said first frame.
 18. A method according to claim 6, wherein the said antitransform is an inverse Fourier transform.
 19. A method according to claim 18, wherein the said inverse Fourier transform is an inverse fast Fourier transform or FFT.
 20. A method according to claim 1, wherein the said first signals are estimated by means of filtering and interpolation.
 21. A method according to claim 20, wherein the filtering of said first signals is activated by the use of at least one finite impulse response filter or FIR.
 22. A method according to claim 1, wherein the said first signals are submitted to an operation for the search of the maximum of the first signals to generate second signals.
 23. A method according to claim 12, wherein the first signals comprise coherence functions C_(j)(t, τ) consisting of the vectors ρ_(j) respectively, wherein the said first signals are submitted to an operation for the search of the maximum of the first signals to generate second signals, and wherein the phase of searching for the maximum comprises the phases of: searching for the maximum of the said coherence functions, filtered and interpolated, C_(j)'(t, τ') when a delay τ' is varied, generating functions M_(j)(t) defined substantially according to the formula

    M_j(t) = \max_{\tau'} C_j'(t, \tau')

when t is varied, and calculating the delays (δ₁, δ₂, δ₃) as delays δ_(j)(t)=τ'_(max) corresponding to the functions M_(j)(t).
 24. A method according to claim 1, wherein the phase of detecting the said acoustic event comprises the phases of: generating a detection signal on the basis of the said second signals, detecting that the detection signal has passed a predetermined threshold.
 25. A method according to claim 12, wherein the first signals comprise coherence functions C_(j)(t, τ) consisting of the vectors ρ_(j) respectively, the said first signals are submitted to an operation for the search of the maximum of the first signals to generate second signals, wherein the phase of searching for the maximum comprises the phases of: searching for the maximum of the coherence functions, filtered and interpolated, C_(j)'(t, τ') when a delay τ' is varied, generating functions M_(j)(t) defined substantially according to the formula ##EQU6## when t is varied, and calculating the delays (δ₁, δ₂, δ₃) as delays δ_(j)(t)=τ'_(max) corresponding to the functions M_(j)(t), wherein the phase of detecting the said acoustic event comprises the phases of: generating a detection signal on the basis of the said second signals, detecting that the detection signal has passed a predetermined threshold, and wherein the said detection signal is substantially generated according to the formula

    d(t) = \max\left[ M_1(t), M_2(t), M_3(t) \right]

in which d(t) is the said detection signal.
 26. A method according to claim 1, wherein the method comprises the operation of locating at any moment the said acoustic event on the basis of delays calculated on the basis of the estimation of signals obtained by antitransformation of the said first signals, wherein the operation for generating the said first signals on the basis of the said acoustic signals comprises the phases of: extracting sampling frames from the said acoustic signals, calculating an integral transform from the said frames, calculating an antitransform of the said cross power spectra, and wherein the operation for locating the said acoustic event comprises the phases of: calculating a branch of a hyperbola having its focus at one of the two detection points, for any pair of detection points corresponding to the pairs of frames, calculating an area, defined by the said branches of a hyperbola, inside which the acoustic event is located.
 27. A method according to claim 1, wherein the operation of generating said first signals on the basis of said acoustic signals comprises the phases of: calculating an integral transform from the said acoustic signals, and calculating cross power spectra for a plurality of pairs of the integral transform, wherein the said cross power spectra are normalized cross power spectra.
 28. A system for the acquisition and processing of acoustic signals inherent in an acoustic event manifested in a given spatial region, comprising: means for acquiring the said acoustic signals at a plurality of different points in the said spatial region, means for generating from the said acoustic signals first signals indicative of cross-spectra for a plurality of pairs of the said acoustic signals, means for extracting phase information present in the said cross-spectra, including: means for locating at any moment the said acoustic event on the basis of delays calculated on the basis of the estimation of signals obtained by antitransformation of the said first signals.
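
By way of illustration only, the phases recited in claims 15 and 22 to 25 (transform of the frames, phase-only cross power spectrum, antitransform, search for the maximum, thresholded detection signal) can be sketched in Python with NumPy as shown below. This is a minimal sketch under stated assumptions, not the claimed implementation. The equations marked ##EQU## are not reproduced in the text, so the normalisation used here (dividing each cross-spectrum component by its magnitude) is an assumption consistent with the "normalised cross power spectra" wording. The function names, the `eps` guard, the threshold value of 0.5 and the 16 kHz sampling rate are hypothetical. The filtering and interpolation of claims 20 and 21 are omitted, so the delays are resolved only to whole samples.

```python
import numpy as np

def phase_coherence(x0, xj, eps=1e-12):
    """Inverse FFT of the phase-only cross power spectrum of two frames.

    Assumed normalisation: each component of X0 * conj(Xj) is divided
    by its own magnitude, so only the phase information is kept.
    """
    X0 = np.fft.fft(x0)
    Xj = np.fft.fft(xj)
    cross = X0 * np.conj(Xj)
    Y = cross / (np.abs(cross) + eps)  # eps avoids division by zero
    return np.real(np.fft.ifft(Y))

def estimate_delay(x0, xj, fs):
    """Return (delay of xj relative to x0 in seconds, peak value M_j)."""
    rho = phase_coherence(x0, xj)
    n = len(rho)
    i_max = int(np.argmax(rho))
    # Indices above n/2 correspond to negative circular lags.
    lag = i_max if i_max <= n // 2 else i_max - n
    # With X0 * conj(Xj), a frame xj that lags x0 peaks at a negative lag,
    # so the sign is flipped to report a positive delay in that case.
    return -lag / fs, float(rho[i_max])

def detect(frames, fs, threshold=0.5):
    """frames[0] is the reference frame x0; the others are x1, x2, ...

    Returns the per-pair delays, the detection signal d = max(M_j),
    and whether d has passed the (hypothetical) threshold.
    """
    results = [estimate_delay(frames[0], xj, fs) for xj in frames[1:]]
    delays = [delay for delay, _ in results]
    d_t = max(peak for _, peak in results)
    return delays, d_t, d_t > threshold

# Synthetic check: three circularly shifted copies of a noise frame.
fs = 16000.0  # assumed sampling frequency F_c in Hz
rng = np.random.default_rng(0)
x0 = rng.standard_normal(256)
frames = [x0, np.roll(x0, 3), np.roll(x0, 5), np.roll(x0, -2)]
delays, d_t, event = detect(frames, fs)
```

In this synthetic case the shifts are exact and circular, so each coherence vector has a single sharp peak. With real microphone frames the peak is lower and broader, which is where filtering, interpolation and the detection threshold become relevant.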